
Releases: huggingface/optimum-habana

v1.17.0: Transformers v4.49

14 Apr 16:34

Transformers v4.49

This release has been tested and validated for Transformers v4.49 and SynapseAI v1.20.

Model optimizations

Tests and CI

Other

  • Disable HPU migration (future add-on to HF diffusers) for OH diffusers #1866 @dsocek
  • Allow explicit control over the flash_attention_fast_softmax setting #1851 @astachowiczhabana
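The setting above is passed through the generation config. Below is a minimal sketch, assuming flash_attention_fast_softmax is exposed on GaudiGenerationConfig alongside use_flash_attention, as in the text-generation examples; the checkpoint name is illustrative.

```python
# Minimal sketch: explicitly controlling the fast-softmax path of flash attention.
# Assumes GaudiGenerationConfig exposes `use_flash_attention` and
# `flash_attention_fast_softmax`; the checkpoint name is illustrative.
import habana_frameworks.torch.core as htcore  # noqa: F401  (registers the HPU device)
from transformers import AutoModelForCausalLM, AutoTokenizer

from optimum.habana.transformers.generation import GaudiGenerationConfig
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch Transformers with the Gaudi-optimized code paths

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("hpu")

generation_config = GaudiGenerationConfig(
    max_new_tokens=32,
    use_flash_attention=True,
    flash_attention_fast_softmax=False,  # explicitly opt out of the fast-softmax variant
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```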

v1.16.0: Deepseek V3, SynapseAI v1.20, Llama 405b, AWQ

12 Mar 09:57

SynapseAI v1.20

This release has been tested and validated for SynapseAI v1.20.

New models

Llama 405b

AWQ

Various model optimizations

Sentence Transformers

CI

  • Implement baselines as a fixture and with simple rebase support #1732 @uartie

Other

v1.15.0: SynapseAI v1.19.0, FLUX, Mllama, DeepSeek, Falcon 3

02 Jan 11:36

SynapseAI v1.19

FLUX

New models

Various model optimizations

Sentence Transformers

Textual Inversion XL

TIMM

Context Parallelism

CI improvements

Documentation

Other

v1.14.1: Patch release

29 Oct 17:13

Full Changelog: v1.14.0...v1.14.1

v1.14.0: Transformers v4.45, SynapseAI v1.18, Qwen2-MoE, text-to-video generation

22 Oct 16:11

Transformers v4.45

SynapseAI v1.18

Qwen2-MoE

  • Added the Qwen2-MoE model with Gaudi-optimized performance #1316 @gyou2021
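A minimal usage sketch follows; it assumes the optimization is enabled by patching Transformers with adapt_transformers_to_gaudi, and the checkpoint name is illustrative (any model using the Qwen2-MoE architecture applies).

```python
# Minimal sketch: running a Qwen2-MoE checkpoint with the Gaudi-optimized code paths.
import habana_frameworks.torch.core as htcore  # noqa: F401  (registers the HPU device)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # swap in the Gaudi-optimized model implementations

model_name = "Qwen/Qwen1.5-MoE-A2.7B"  # illustrative Qwen2-MoE checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("hpu")

inputs = tokenizer("Habana Gaudi is", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```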

Text-to-video generation

Depth-to-image generation

Model optimizations

Intel Neural Compressor

  • Enable INC for Llava models and switch softmax to torch.nn.functional.softmax, which is a module supported by INC (see the sketch after this list) #1325 @tthakkal
  • Load INC GPTQ checkpoint & rename params #1364 @HolyFalafel
  • Fix INC weight-loading compile error caused by the Transformers 4.45 upgrade #1421 @jiminha
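On the softmax change above: quantization tooling matches the standard functional API more reliably than a hand-rolled implementation. The snippet below is only an illustration of that substitution, not the actual Llava code.

```python
# Illustrative only: a hand-rolled softmax vs. the functional API that
# quantization tools such as INC can recognize.
import torch
import torch.nn.functional as F

scores = torch.randn(2, 8, 16, 16)  # dummy attention scores

# Hand-rolled softmax: harder for quantization tooling to match.
manual = torch.exp(scores) / torch.exp(scores).sum(dim=-1, keepdim=True)

# Equivalent computation through the supported functional API.
supported = F.softmax(scores, dim=-1)

assert torch.allclose(manual, supported, atol=1e-6)
```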

Vera/LN-tuning

Other

v1.13.2: Patch release

06 Sep 20:17

Llava(-next) improvements

This patch release adds multi-card support for Llava(-next) and enables users to turn on/off recomputing for flash attention.

  • Llava: added a flash_attention_recompute argument to enable/disable recompute for flash attention (see the sketch after this list) #1278 @tthakkal
  • Add the DeepSpeed injection_policy for Mistral #1309 @yuanwu2017
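A minimal sketch of the new flag, assuming the Gaudi Llava(-next) implementation accepts use_flash_attention and flash_attention_recompute as generation kwargs, as in the image-to-text example; the checkpoint and image are illustrative.

```python
# Minimal sketch: toggling flash attention recompute for Llava-next generation.
import habana_frameworks.torch.core as htcore  # noqa: F401  (registers the HPU device)
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaNextForConditionalGeneration

from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch Transformers with the Gaudi-optimized code paths

model_name = "llava-hf/llava-v1.6-mistral-7b-hf"  # illustrative checkpoint
processor = AutoProcessor.from_pretrained(model_name)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
).to("hpu")

url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "[INST] <image>\nDescribe the image. [/INST]"
# Cast floating-point inputs to bf16 to match the model weights.
inputs = processor(images=image, text=prompt, return_tensors="pt").to("hpu", torch.bfloat16)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    use_flash_attention=True,
    flash_attention_recompute=True,  # trade extra compute for lower memory use
)
print(processor.decode(outputs[0], skip_special_tokens=True))
```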

Full Changelog: v1.13.1...v1.13.2

v1.13.1: Patch release

25 Aug 13:34

Fixed memory regressions

  • Remove _expand_inputs_for_generation for greedy search (#1266) @libinta
  • Fix memory regression for modeling llama (#1271) @libinta

FSDP

FSDP checkpoint saving is fixed.

Known limitations

  • ESMFold does not work on Gaudi1; this will be fixed in a future version

Full Changelog: v1.13.0...v1.13.1

v1.13.0: Stable Diffusion 3, Sentence Transformers, SAM, DETR, Kubernetes example

16 Aug 14:25

SynapseAI 1.17

  • Upgrade SynapseAI version to 1.17.0 #1217

Transformers 4.43

Diffusers 0.29

  • Upgrade optimum-habana diffusers dependency from 0.26.3 to 0.29.2 #1150 @dsocek

Stable Diffusion 3

Training with Sentence Transformers

Model optimizations

SAM, FastViT, VideoMAE, OpenCLIP, DETR, Table Transformer, DeciLM

Stable Diffusion inpainting, unconditional image generation

  • Add Stable Diffusion inpainting support (see the sketch after this list) #869 @yuanwu2017
  • Enable unconditional image generation on Gaudi 2 (diffusers) #859 @cfgfung
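A minimal sketch of the inpainting support, assuming it is exposed as GaudiStableDiffusionInpaintPipeline in optimum.habana.diffusers; the checkpoint, Gaudi configuration, and image URLs are illustrative.

```python
# Minimal sketch: Stable Diffusion inpainting on Gaudi.
import requests
from PIL import Image

from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionInpaintPipeline

model_name = "stabilityai/stable-diffusion-2-inpainting"  # illustrative checkpoint
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionInpaintPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

def load_image(url):
    return Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((512, 512))

base = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples"
init_image = load_image(f"{base}/overture-creations-5sI6fQgYIuo.png")
mask_image = load_image(f"{base}/overture-creations-5sI6fQgYIuo_mask.png")

image = pipeline(
    prompt="a white cat sitting on a bench",
    image=init_image,
    mask_image=mask_image,
).images[0]
image.save("inpainted.png")
```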

Text feature extraction example

Tensor parallelism

  • Tensor parallel distributed strategy without using deepspeed #1121 @kalyanjk
  • Disable torch.compile for all_reduce when parallel_strategy is set to "tp" #1174 @kalyanjk

Kubernetes cluster example

  • Add a Helm chart, Dockerfile, and instructions for running the examples on a Kubernetes cluster #1099 @dmsuehir
  • Fix the PyTorch version in the Kubernetes docker-compose file to match the image #1246 @dmsuehir

FP8 training

Other

Known limitations

  • For Llama, some large batch sizes that used to work now lead to out-of-memory errors

v1.12.1: Patch Release

11 Jul 13:51

Fix first-token latency measurement

Fix for Mixtral

Other

  • Fix for selective seq length test with batch size 1 #1110 @libinta

Full Changelog: v1.12.0...v1.12.1

v1.12: Qwen2, Gemma, SVD, Dreambooth, speculative sampling

22 Jun 18:28

SynapseAI v1.16

Transformers 4.40

Speculative Sampling

Model optimizations

Stable Video Diffusion

PEFT

TRL

Object Segmentation Example

  • Add an example of object segmentation (ClipSeg) #801 @cfgfung
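A minimal sketch of the idea behind the example, zero-shot object segmentation from text prompts with ClipSeg on an HPU; this is not the exact script, and the checkpoint, image, and prompts are illustrative.

```python
# Minimal sketch: zero-shot object segmentation with ClipSeg on an HPU.
import habana_frameworks.torch.core as htcore  # noqa: F401  (registers the HPU device)
import requests
import torch
from PIL import Image
from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

model_name = "CIDAS/clipseg-rd64-refined"  # illustrative checkpoint
processor = CLIPSegProcessor.from_pretrained(model_name)
model = CLIPSegForImageSegmentation.from_pretrained(model_name).to("hpu")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompts = ["a cat", "a remote control"]

inputs = processor(
    text=prompts, images=[image] * len(prompts), padding=True, return_tensors="pt"
).to("hpu")
with torch.no_grad():
    outputs = model(**inputs)

masks = torch.sigmoid(outputs.logits).cpu()  # one low-resolution mask per prompt
print(masks.shape)
```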

Dreambooth

  • DreamBooth full/LoRA/LoKr/LoHa/OFT fine-tuning and DreamBooth XL LoRA fine-tuning for diffusers #881 @sywangyi

Other